Segmenting Conversations by Topic, Initiative, and Style
Abstract
Topical segmentation is a basic tool for information access to audio records of meetings and other types of speech documents, which may be fairly long and contain multiple topics. Standard segmentation algorithms are typically based on keywords, pitch contours, or pauses. This work demonstrates that speaker initiative and style may be used as segmentation criteria as well. A probabilistic segmentation procedure is presented which allows these features to be modeled and integrated in a clean framework, with good results. Keyword-based segmentation methods degrade significantly on our meeting database when speech recognizer transcripts are used instead of manual transcripts. Speaker initiative is an interesting feature since it delivers good segmentations and should be easy to obtain from the audio. Variation in speech style at the beginning, middle, and end of topics may also be exploited for topical segmentation and would not require the detection of rare keywords.
ACM SIGIR’01 Workshop on Information Retrieval Techniques for Speech Applications, New Orleans, Louisiana, September 13, 2001
Similar resources
Supervised Topic Segmentation of Email Conversations
We propose a graph-theoretic supervised topic segmentation model for email conversations which combines (i) lexical knowledge, (ii) conversational features, and (iii) topic features. We compare our results with the existing unsupervised models (i.e., LCSeg and LDA), and with their two extensions for email conversations (i.e., LCSeg+FQG and LDA+FQG) that not only use lexical information but also...
Compassionate Conversations
Staff engagement is much more than just a bonus in any organisation. CQC data shows that it is very clearly linked to positive results in both patient and staff outcomes (fewer complaints, improved safety, reduced sickness, fewer accidents, and more as per Michael West). Staff engagement may seem nebulous but is in fact measured routinely annually in the National Staff Survey. The problem is th...
F0 correlates of topic and subject in spontaneous Japanese speech
This paper examines F0 correlates of morphologically marked grammatical functions, in particular topic and subject, in spontaneous Japanese speech. Our data consist of F0 measurements of 7,106 nouns in the CallHome Japanese corpus of telephone conversations [4]. We find that topics exhibit higher peak F0 than subjects, contradicting information-structure accounts which predict that topics, whic...
A Hierarchical Bayesian Model for Topic Segmentation
Many streams of real-world data, such as conversations or body movements, consist of relatively coherent segments, each characterized by particular topics or controllers. Making sense of these data requires simultaneously segmenting the sequences and inferring the structure of the segments. We present a hierarchical Bayesian model that can be used to break a sequence of utterances or movements ...
An Initial Test Collection for Ranked Retrieval of SMS Conversations
This paper describes a test collection for evaluating systems that search English SMS (Short Message Service) conversations. The collection is built from about 120,000 text messages. Topic development involved identifying typical types of information needs, then generating topics of each type for which relevant content might be found in the collection. Relevance judgments were then made for gro...
Publication date: 2001